Skip to content

Modify ori file reader to deal with absence of EN (or not) at the end of the file#516

Open
GallegoSav wants to merge 3 commits intocositools:developfrom
GallegoSav:issue_503
Open

Modify ori file reader to deal with absence of EN (or not) at the end of the file#516
GallegoSav wants to merge 3 commits intocositools:developfrom
GallegoSav:issue_503

Conversation

@GallegoSav
Copy link
Contributor

This pr is for solving #503 .

If there is EN at the end of the ori file, dropna will remove it. If not , nothing will change

Updated data processing to drop NaN values from the dataframe.
@GallegoSav GallegoSav changed the title Modify ori file reader to deal with absence of EN (or not) at the end orf the file Modify ori file reader to deal with absence of EN (or not) at the end of the file Mar 13, 2026
@codecov
Copy link

codecov bot commented Mar 13, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 71.18%. Comparing base (4911568) to head (61429a9).
⚠️ Report is 29 commits behind head on develop.

Files with missing lines Coverage Δ
cosipy/spacecraftfile/spacecraft_file.py 85.08% <100.00%> (ø)
🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@jdbuhler
Copy link
Contributor

If the issue is that one particular ori file is malformed because the expected EN got dropped, why are we not fixing the file in question, instead of complicating the parser in a way that invites future bugs?

@jdbuhler
Copy link
Contributor

jdbuhler commented Mar 14, 2026

Further comments:

Of the three .ori files used in the tutorials,

  • 20280301_3_month... has a trailing EN line
  • DC3_final_...15sbins... has no trailing EN (but does have a blank line at the end, which the CSV parser ignores) and so was having the last line dropped
  • DC3_final_...1sbins... has neither a trailing EN nor a blank and so was having the last line dropped

The test case .ori file I recently extracted from DC3_final_...1sbins... follows its format and so also lacks any trailing marker. The other three test case .ori files all have an EN marker.

So before trying to fix the code, can we clarify what we consider the correct .ori format? Can MEGAlib itself ever produce an .ori file without the EN marker? If not, I would argue that we should fix the two files we are using without such markers and then have cosipy throw an error if the EN is missing. We could even expend a bit of extra time to check for the EN explicitly, rather than relying on an indirect test with the CSV parser, since .ori is no longer our primary input format now that we have FITS.

In any case, I clearly need to regenerate the FITS file for _1sbins and _15sbins so they have the last line, and also to fix that one test case .ori file.

@jdbuhler
Copy link
Contributor

Another comment on how we parse .ori files: the basic parsing call is

df = pd.read_csv(file, sep=r"\s+", skiprows=1, usecols=tuple(range(1,10)), header = None, comment = '#')

Do we actually want to support commenting out lines with '#'? Is this a feature of MEGAlib's ori files? If not, I would argue that it is more likely to cause unexpected behavior than to be useful.

@israelmcmc
Copy link
Collaborator

So before trying to fix the code, can we clarify what we consider the correct .ori format? Can MEGAlib itself ever produce an .ori file without the EN marker? If not, I would argue that we should fix the two files we are using without such markers and then have cosipy throw an error if the EN is missing. We could even expend a bit of extra time to check for the EN explicitly, rather than relying on an indirect test with the CSV parser, since .ori is no longer our primary input format now that we have FITS.

I know I was the one who suggested the workaround that @GallegoSav implemented in this PR (thanks @GallegoSav btw). I was trying to save some time, but @jdbuhler, if you are willing to fix the files (it seem you already did #517, correct?) and add a check for EN, then I agree and we can go with your solution instead.

Do we actually want to support commenting out lines with '#'? Is this a feature of MEGAlib's ori files?

I don't think MEGALib comments out lines with # for .ori files specifically (it does for other types of files) but we sometimes add information on the files with # after the fact manually (for e.g. caveats, provenance, etc.)

@jdbuhler
Copy link
Contributor

Yes, I fixed the two offending tutorial .ori files (which are now in the develop tree on wasabi). Are we good with replacing the existing files in DC3 with these?

I can work on the explicit EN check outside the framework of CSV parsing -- I'll just seek to the end of the file and look there.

@israelmcmc
Copy link
Collaborator

israelmcmc commented Mar 16, 2026

Are we good with replacing the existing files in DC3 with these?

My preference is to leave those as is, and only fix the develop and DC4 folders. For DC3 I try to do hot fixes only when absolutely necessary. In this case the error from missing the last line is pretty small, and it doesn't prevent the code from running. Otherwise we would have to change the checksum of the release as well, not just develop. I'm also tagging @ckarwin to see what he thinks.

I can work on the explicit EN check outside the framework of CSV parsing -- I'll just seek to the end of the file and look there.

Sounds good, thank you.

@jdbuhler
Copy link
Contributor

@israelmcmc and @GallegoSav , please see PR #533.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants